
Skip Triton import for AMD #5110

Merged: 4 commits merged into master from lekurile/disable_triton_amd on Feb 9, 2024
Conversation

@lekurile (Contributor) commented on Feb 9, 2024

When testing DeepSpeed inference on an `AMD Instinct MI250X/MI250` GPU, the `pytorch-triton-rocm` module would break the `torch.cuda` device API. To address this, the `triton` import is skipped when the GPU is determined to be AMD.

This change allows DeepSpeed to run on an AMD GPU without kernel injection in the DeepSpeedExamples [text-generation example](https://github.com/microsoft/DeepSpeedExamples/tree/master/inference/huggingface/text-generation) using the following command:

```bash
deepspeed --num_gpus 1 inference-test.py --model facebook/opt-125m
```

TODO: Root-cause the interaction between `pytorch-triton-rocm` and DeepSpeed to understand why it breaks the `torch.cuda` device API.
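For context, a conditional import guard of roughly this shape could implement the behavior described above. This is a minimal sketch, not the PR's actual diff: the `_running_on_amd` helper name is hypothetical, and the AMD check assumes a ROCm build of PyTorch, which exposes a non-`None` `torch.version.hip`.

```python
import torch


def _running_on_amd() -> bool:
    # Hypothetical helper: ROCm builds of PyTorch set torch.version.hip,
    # while CUDA builds leave it as None.
    return getattr(torch.version, "hip", None) is not None


HAS_TRITON = False
if not _running_on_amd():
    # Attempt the triton import only on non-AMD devices, since
    # pytorch-triton-rocm was observed to break the torch.cuda device API.
    try:
        import triton  # noqa: F401
        HAS_TRITON = True
    except ImportError:
        HAS_TRITON = False
```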

deepspeed/__init__.py: review comment (outdated, resolved)
@mrwyattii merged commit d04a838 into master on Feb 9, 2024
15 checks passed
@mrwyattii deleted the lekurile/disable_triton_amd branch on February 9, 2024 at 22:44
mauryaavinash95 pushed a commit to mauryaavinash95/DeepSpeed that referenced this pull request Feb 17, 2024
rraminen pushed a commit to ROCm/DeepSpeed that referenced this pull request May 9, 2024